Performance Analysis of Deep-Neural-Network-Based Automatic Diagnosis of Diabetic Retinopathy

Diabetic retinopathy (DR) is a human eye disease that affects people who are suffering from diabetes. It causes damage to their eyes, including vision loss. It is treatable; however, it takes a long time to diagnose and may require many eye exams. Early detection of DR may prevent or delay the vision loss. Therefore, a robust, automatic and computer-based diagnosis of DR is essential. Currently, deep neural networks are being utilized in numerous medical areas to diagnose various diseases. Consequently, deep transfer learning is utilized in this article. We employ five convolutional-neural-network-based designs (AlexNet, GoogleNet, Inception V4, Inception ResNet V2 and ResNeXt-50). A collection of DR pictures is created. Subsequently, the created collections are labeled with an appropriate treatment approach. This automates the diagnosis and assists patients through subsequent therapies. Furthermore, in order to identify the severity of DR retina pictures, we use our own dataset to train deep convolutional neural networks (CNNs). Experimental results reveal that the pre-trained model Se-ResNeXt-50 obtains the best classification accuracy of 97.53% for our dataset out of all pre-trained models. Moreover, we perform five different experiments on each CNN architecture. As a result, a minimum accuracy of 84.01% is achieved for a five-degree classification.


Introduction
Diabetic retinopathy (DR) is a human eye infection in people with diabetes. It is initiated due to retinal vascular damage, which is caused by diabetes mellitus for a longduration [1]. This disease is one of the most common reasons behind blindness [2]. Therefore, its detection in the early stages is critical [3]. There are many treatments for this disease; however, they take plenty of time and may even include many eye tests such as photo-coagulation and vitrectomy [4].
According to a survey in Europe, almost 60 million people are diabetes patients and they are most prone to DR. In the United States, 10.2 million people with an age of 40 or above have diabetes. Furthermore, 40% of these people are at risk of some visionthreatening disease [5]. Moreover, the survey of the Center for Disease Control in 2020 revealed that 3.3 million people are suffering from DR [6]. According to the World Health Organization, diabetes has affected 422 million people to date and this number will become 629 million by 2045 [7,8].
DR is normally categorized into five different groups: Normal-0, Mild-1, Moderate-2, Severe-3 and Proliferative-4 as listed in Table 1. The disease starts with small changes in the blood vessels of the eyes, which could be labeled as Mild DR. Concerning the case of Mild DR, the patient could defeat this disease and complete recovery is possible. If this condition of the disease is left untreated, then it will convert into Moderate DR. The leakage in the blood vessels may start in the case of Moderate DR. In the next stage, if the disease increases further then it changes to Severe and Proliferative DR and it could cause complete vision loss. Table 1. Different stages of diabetic retinopathy with the passage of time [9]. High bleeding and the formation of new blood vessels elsewhere in the retina. Complete blindness.

Stage
The current detection of DR is made through a dilated eye exam in which the doctors put some eye drops into the patient's eyes. Subsequently, an image of the eye is taken with the help of various medical instruments. This technique is manual and therefore there are always some errors in diagnosis. Another way of detecting DR is examining through ophthalmoscopy. In one study, approximately 16% of patients were diagnosed as DR patients using ophthalmoscopy in respect of 442 right eyes [10].
Image processing is also used to identify DR based on highlights; for example, veins, radiates, hemorrhages and small-scale aneurysms. During this process, digital fundus cameras are used to obtain accurate eye images. Techniques like image enhancement, fusion, morphology detection and image segmentation help medical doctors to obtain more information from the data of medical images [10]. In the case of DR, people are not aware of the disease unless a manual detection is made. Due to the lack of related treatment, according to the specific level of the disease, chances of losing eyesight may increase [11].

State-of-the-Art on DR Detection Dsing Deep Learning Techniques
Numerous techniques have been proposed to detect DR. This section focuses on multiclass classification using deep learning and neural network techniques. Some studies have classified the fundus images into two categories: diabetic, which includes average to extreme conditions of non-proliferative DR; and non-diabetic, where the person is not affected with DR) [12]. Based on this, they proposed a technique to accurately appoint the class where a fundus image could be labeled, utilizing one principal classifier and back propagation neural organization (BPNN) procedures.
Similarly, a deep-learning-based method has been proposed to classify the fundus photographs for human ophthalmologist diagnostics. Authors have built a novel Siameselike CNN (convolutional neural network) binocular model based on Inception V3 that can acknowledge fundus pictures of both eyes and yield the output of each eye at the same time [13]. A hybrid approach for diagnosing DR has been proposed that uses histogram equalization (HE) and contrast limited adaptive histogram equalization (CLAHE) to assist the deep learning model [14]. It provides more accentuation and effectiveness by way of the intelligent enhancement of the image during the diagnosis process. The authors exploited five CNN architectures to evaluate the performance parameters for the dataset of DR patients. Their classification methodology classifies images into three different groups based on the condition of the disease [15].
The authors developed a novel ResNet18-based CNN architecture to diagnose DR patients. This approach helps in solving a strong class imbalance problem and generates region scoring maps (RSMs) [16]. Furthermore, it indicates the severity level by highlighting the semantic regions of the fundus image. The authors proposed a technique only for the detection of DR regardless of the severity of DR. They classified images as normal and abnormal for the targeted dataset [17]. Similarly, the authors proposed a deep-learningbased CNN to classify a small dataset of DR images, using Cohen's kappa as an accuracy metric [18].
In addition to the aforementioned research works, many datasets of fundus images have been developed for DR-related diagnoses. For example, TeleOphta uses a tele-ophthalmology network for diabetic retinopathy screening [19]. Other examples are Digital Retinal Images for Vessel Extraction (DRIVE) and Structured Analysis of the Retina (STARE), which are used to segment the vessel network using local and global vessel features [20,21]. Similarly, the SVM (support vector machine) provides 95% and Bayesian provides 90% accuracy [11]. In this technique, images are segmented, outliers are detected, image analysis is performed and the brightness is controlled. In an another technique, SVM provides 86% accuracy and KNN (K-nearest neighbor) provides 55% accuracy [22]. In KNN, images are clustered with the help of pixel clusters. The fundus image mask is removed with the help of pixel clustering [22].
There is another technique known as the extreme learning machine (ELM) design for detecting a disease in eye blood vessels. This technique is mainly used for the detection of diseased blood vessels. Some of the blood vessels are injured in diabetic retinopathy. In this technique, an image is provided to the ELM. The provided algorithm calculates the grayscale value and chooses some features that provide more information than other pixels. Consequently, researchers can achieve 90% accuracy [10]. Similarly, the authors analyzed various blood vessel segmentation techniques in [23,24]. They further identified the lesions for the detection of diabetic retinopathy. The results were compared with the neural network technique.
Finally, by integrating microaneurysms, haemorrhages and exudates, the authors described a method for detecting non-proliferative diabetic retinopathy [25]. They developed a novel convolutional layer that automatically determines the number of extracted features. Each category is then placed into different folders so that there exist a small number of patches for the model to process at runtime. Subsequently, six convolutional layers are added to the model to obtain a validation accuracy of 72% and a training accuracy of 75% .

Research Gap
Although pre-trained CNNs have been used previously for different diseases, there is a need to enhance the accuracy of classification using a custom dataset and deep transfer learning. A dataset composed of low-resolution DR images, as employed in the conventional methods of Section 1.1, may cause low accuracy or incorrect classification. At the same time, a high-risk patient in the proliferate category requires immediate cure and diagnosis. Keeping this view, the diagnosis procedure requires high accuracy with adequate images of the posterior pole. In a nutshell, there should be an efficient, immediate and autonomous method that can recognize retinopathy with accurate outcomes. This implies that there should be a methodology to evaluate the classification performance parameters on recent CNN architectures.

Contributions
In this article we propose a methodology to classify the DR images using five different pre-trained CNNs. The contributions are summarized in the following points:

•
Our proposed methodology is flexible and automatically detects the classified pictures of patients with a higher accuracy. It classifies the dataset based on the severity of the disease in different stages/categories. Moreover, it helps doctors to select one or more CNN architectures for the diagnosis. • We have analyzed the robustness of CNN architectures on our constructed (customized) dataset for the diagnosis of DR patients. A brief description of the customized dataset is provided in Section 1.4. It highlights how both CNN and dataset directly or indirectly affect performance evaluation. It implies that deep transfer learning techniques have been used with some pre-trained models and customized datasets to obtain high-accuracy results. • We have also analyzed how the previously made architectures will perform on our dataset and how these architectures can be fine-tuned to obtain the best results on our dataset. • To the best of our knowledge, the proposed work in this article is the first effort to consider the evaluation of recent CNNs, using a customized dataset.
The objective is to provide accurate and less time-consuming results (as compared to the manual methods) by applying different deep neural network algorithms for the classification of different eyes infected by the illness. This helps to obtain more information from the classified images. Consequently, doctors will be able to detect diabetic retinopathy levels more accurately.

Customized Dataset for Performance Evaluation
The classification accuracy of the DR mainly depends upon the size of the dataset. This implies that a higher accuracy requires a huge amount of training data using a machine learning algorithm. Moreover, the data should be collected from reliable sources with accurate tags. The following datasets are most widely used for DR detection: Digital Retinal Images for Vessel Extraction (DRIVE) dataset [20], Structured Analysis of the Retina (STARE) dataset [21], E-ophtha dataset [19] and Kaggle Diabetic Retinopathy dataset [26,27].
In this study, we created our custom dataset as explained in Section 4.1. The created dataset was built from different resources which are based on different severity levels. It also includes EyePacs [26], which has collected approximately 5 million images from 75,000 patients. Another dataset from Kaggle, which consists of 53,594 images for testing and 35,126 images for training, is also available for analysis. The Kaggle dataset includes a significant number of pictures (72,743) from DR patients. Furthermore, it has pictures for all DR categories in a single folder. Moreover, it also contains categories of various images and their descriptions in the form of comma separated value (CSV) files.
The corresponding enhancements and preprocessing of data are explained in Section 2.1, where all the images are oriented, resized and horizontally flipped. Moreover, the intensity of images is also enhanced.Furthermore, an augmentation is performed where all the images are made consistent in terms of size and intensity. The aforementioned enhancements and preprocessing techniques help CNN for the robust classification. Based on the aforementioned databases, we constructed a dataset of 5333 images, where 1421 are normal, 954 are mild, 1210 are moderate, 308 are severe and 1440 are high-risk patients (see Section 4.1).

Organization
The organization of this paper is as follows. The proposed methodology is described in Section 2. Section 3 explains the pre-trained CNN architectures and different performance matrices used in the results. Section 4 reports the results and implementation in light of the proposed methodology. The article is concluded in Section 5.

Proposed Approach
The proposed methodology is illustrated in Figure 1. The entire process comprises five steps. First, the retina pictures are pre-processed and supplemented using pre-trained models. Deep transfer learning (DTL) is then used during the training phase. During classification, feature extraction and precise prediction of models are employed. The retina prediction is made using a machine learning algorithm. Subsequently, it is classified into five different groups based on the severity of DR as described in Table 1. The following steps are involved during the prediction process: dataset, data preprocessing, model setup and evaluation. In the dataset, the method of data generation for training and testing purposes is described. In data pre-processing, the pipeline for bringing the pictures from various sources is portrayed. Similarly, the model setup describes multiple convolution layers for the classification of images. Finally, the results are evaluated and analyzed.
The data were collected from different resources to construct a new dataset. Furthermore, Python visualization libraries were used to visualize our data [27]. The proposed method in this article employs deep neural networks and a supervised learning architecture (CNN) for image detection. The supervised learning is used for model training. After model training, the sample data are tested and verified with the given training data. Moreover, some evaluation techniques are applied for the classification of results. After executing classification techniques, results are classified on the basis of training data. Finally, the model accuracy is measured in comparison to training data.

Pre-Processing and Enhancement of DR Dataset
Actual pre-trained CNN models are too large to handle the retina images dataset, resulting in overfitting issues. To address this problem, a variation can be introduced to the dataset. Adding variation at the early point (input) of a neural network causes significant changes in the dataset generalization. A variation refers to the fact that the noise addition task augments the dataset in some way. The dataset constraint is one of the critical challenges faced by researchers in the healthcare field. As a result, we have employed some additional augmentation approaches. The retina image dataset was created as follows. After resizing the photos to 224 × 224 × 3, we used the following augmentation methods: random horizontal flip (aids in the detection of DR based on severity level), random resized crop (the last stage of DR, i.e., proliferate) and, last, picture enhancement by altering picture intensities.

CNN Architecture
Deep neural networks based on CNN models have recently been employed to handle computer vision challenges. To categorize the DR dataset among normal and various levels of DR patients, we employed deep-CNN-model-based AlexNet [28], GoogleNet [29], Inception V4 [30], Inception ResNet V2 [31] and ResNeXt-50 [32] models, as well as transfer learning approaches. Transfer learning may also aid with class imbalance and model execution time. The employed CNN models, as well as AlexNet, GoogleNet, Inception V4, Inception ResNet V2 and ResNeXt-50 models, are presented schematically in Figure 2. Pre-trained models work quite well on a new dataset before being used for classification. The DTL is a useful approach for solving the issue of unfit training data. The goal of this strategy is to extract the information from a process (issue). The extracted information is then utilized over comparable tasks by overcoming isolated learning issues. This understanding provides an incentive to tackle the problems in a variety of disciplines where the development is hard. It has resulted in insufficient or partial training data. Figure 3 depicts the DTL process.  We utilize nine pre-trained architectures to deal with the retina image dataset, rather than using the long training process from scratch. The weights of existing pre-trained model layers are re-used for model training in a different domain, as illustrated in Figure 4. The DTL methodology has yielded beneficial and significant achievements in a variety of computer vision areas [33][34][35][36]. We used CNN architectural weights that had already been learned. Moreover, the entire model was fine-tuned with some appropriate learning rates.

Pre-trained weights
Learnable weights

Pre-Trained CNN Architectures and Performance Matrices
We selected five distinct pre-trained CNN architectures: AlexNet [28], GoogleNet [29], Inception V4 [30], Inception ResNet V2 [31] and ResNeXt-50 [32]. These models are used to classify the DR image dataset. In order to modify the classification layer, fine tuning is employed. The fine-tuning process extracts features for the targeted tasks. Since pre-trained models are utilized, only the previously diagnosed diabetic retinopathy images are used to make the models more accurate. The model training process is given as follows: • Load the pictures from every type of folder. In the end, we can visualize loss and accuracy.

AlexNet Architecture
AlexNet is the name of a convolutional neural network that has made a significant contribution to the field of machine learning. This is particularly true for in-depth learning in machine vision. The AlexNet architecture has 5 layers of flexibility, 3 layers of merging, 2 layers of standardization, 2 fully connected layers, and 1 softmax tile. The convolutional filters and the nonlinear activation function ReLU are included in each convolutional layer. Blending layers are used to create a variety of combinations. Due to the existence of completely linked layers, the input size is modified. Convolutional neural networks are a key component of neural networks. They are made up of neurons with a readable weight and bias. Each specific neuron receives a number of inputs. Subsequently, it takes a weight-bearing amount on top of it. Finally, it is transmitted by activating, turning and releasing. The complete architecture of AlexNet is illustrated in Figure 5.

GoogleNet Architecture
GoogleNet is a 22-level deep congenital neural network. Its salient feature is to work very fast. It has less memory usage and less power consumption. This neural network utilizes the averaged value of global pooling and maximum pooling. For our pre-trained model, it consists of four parallel paths. The inception blocks perform a convolution (1 × 1, 3 × 3, 5 × 5 window sizes) for spatial sizes and information extraction. The ReLU is also included in the convolution layer. The inception block is utilized three times. The first two inception blocks are used for 3 × 3 maximum pooling, while the third is used as a global average pool linked by a thick layer. The complete architecture of GoogleNet is illustrated in Figure 6.

Inception V4 Architecture
Concerning the deep CNN architectures, Inception is considered for a good performance with a low execution cost. It was initially introduced in [31] as Inception v1. Then, this architecture was improved with the concept of batch normalization to a new variant named Inception v2. Next, factorization was introduced during iterations to form different variants, i.e., Inception v4, Inception ResNet V1 and Inception ResNet V2. Inception v4 is a slightly modified version of Inception v3. The Inception model and Inception ResNet model are residual and non-residual variants; this is the main difference. Moreover, batch normalization is only used on top of the traditional layer rather than residual summations. The architecture of Inception v4 consists of the initial set of layers that were modified to make it uniform. This is referred to as the "stem of the architecture" and is used in front of the Inception block in the architecture. This does not require the partition of the replicas, which enables a training feature. However, the previous versions of the Inception architecture require a replica to fit in the memory. This also reduces the memory requirement because it uses memory optimization during backpropagation. In our paper, we use Inception v4 and Inception ResNet V2. The explanation of Inception ResNet V2 is given in the next section. The complete architecture of Inception V4 is illustrated in Figure 7.

Inception ResNet V2 Architecture
Inception ResNet V2 is a decisive neural structure built into the Inception family of architectures. It incorporates residual connections (changes the filter concatenation stage of Inception construction). It has an ability to split images into 1000 objects, e.g., mouse, keyboard, pencil. The network has a read rich property to accept presentations of various images. The network has an input image size of 299 × 299. The output is a vector form of measurable probability. The complete build of the network is based on a combination of the original structure and the remaining connections. Moreover, multiple heavy filters are integrated with the remaining connections. The use of residual connections not only prevents degradation (caused by deep structures) but also reduces the training time. The complete architecture of Inception ResNet V2 is illustrated in Figure 8.

ResNeXt-50 Architecture
ResNeXt-50 uses a squeeze and excitation (SE) block for each non-identity branch of a residual block. It comprises 5 different sections, including convolution and identity blocks. A single convolution block has three layers of convolution and each ID has 3 stages of conversion. The SE block acts as a computational unit that performs transformations from inputs to feature maps. It can be attached with different CNN architectures and residual networks. The SE block is placed before summation, which increases the computational cost. However, it enables ResNext-50 to achieve a higher accuracy as compared to ResNet-50. The complete architecture of Inception ResNet V2 is illustrated in Figure 9.

Performance Matrices
All of the previously mentioned CCNs were utilized in our experiments. We considered five parameters to evaluate the aforementioned CNN architectures to classify the retina images. All these parameters were calculated using four important terms from the confusion matrix, which are True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). Therefore, the corresponding values for these parameters (accuracy, error rate, precision, recall and Fscore) are given in Equations (1) and (4)-(6), subsequently.
The accuracy of the classifier depends on different parameters, as given in Equation (1). Moreover, the rate of sensitivity interprets the ability of a classifier to correctly form the target class, as given in Equation (2). Similarly, the rate of specificity illustrates the capability of a classifier for separation, as shown in Equation (3). The precision rate evaluates the determination of a certain class. Finally, FScore is the harmonic mean sensitivity (recall) and accuracy value as set forth in Equation (6). The analytical average error value may be determined using Equation (5). In our research, all associated evaluation parameters for CNNs were computed. Consequently, the findings are presented in the next section based on the above parameters.

Results and Implementation
This section provides the description of the proposed custom dataset, experimental setup and obtained results.

Creation of Custom Dataset
We created our custom dataset of fundus images to grade the severity level of DR. The proposed approach contrasts with the existing grading (as mentioned in Section 1.1), which grades fundus images based on the pathological changes in the retina. In addition to this, we consider the clinical practice; that is, we categorize a fundus picture of the foundation of abnormalities and the treatment technique. For training and testing, the pictures are divided and placed in different files. A custom script is created to determine the kind of picture based on its tags. The pictures are then cropped and the essential characteristics are separated. Furthermore, a filtering technique is employed to equalize and contrast the picture modification. To increase the variety of data, data augmentation is used. Finally, flipping, cropping and padding are performed. To summarize, the created dataset comprises 1440 images for positive DR patients (high-risk).

Experimental Setup
We developed some fine-tuned CNN architectures to classify DR pictures. These architectures are AlexNet, GoogleNet, Inception V4, Inception ResNet V2 and ResNeXt-50. Each CNN architecture uses fully connected (FC) layers with a classification criticality of the final FC layer. The number of neurons in the final FC layer is calculated using the target dataset. It is necessary to set and optimize these parameters as the CNN architecture itself is not able to define parameters for the fine-tuning method. Therefore, the parameters are defined using the results of training for the improvement of performance. We utilized the Adam optimizer for the training of every network architecture with 30 epochs (maximum). The batch size and initial learning rate for training and testing are 32, 8 and 1e-5, respectively. We used Python (a programming language) to train the CNN models. All the experiments were executed on an NVIDIA GPU (NVIDIA CUDA Version: 10.1 with Tesla P100) using a Google Colaboratory. PyTorch version 1.5 was utilized to execute experimentation on pretrained CNN models (AlexNet, GoogleNet, Inception V4, Inception ResNet V2 and ResNeXt-50) using weights (This is a random value of initial weights for our pre-trained CNN architectures). The aforementioned CNN architecture takes advantage of hyperparameters for DTL, as shown in Table 2.

Results and Analysis
The proposed methodology in this work was evaluated using the performance metrics (see Section 3.6), which were calculated during experiments. This methodology also explores the fine-tuning of DTL, which includes the extraction of the features from pretrained CNN networks. The experimental study was conducted using our custom dataset which is completely based on 4 publicly available datasets as described in Section 1.4. The complete experimental process was based on five pre-trained CNN networks, i.e., AlexNet, GoogleNet, Inception V4, Inception ResNet V2 and ResNeXt-50.
The process starts by evaluating the accuracy for a multi-class dataset. Next, k-fold is utilized to evaluate the average classification accuracy. The value of the average accuracy is calculated using the values of individual accuracy. The results illustrate the exploration of the DTL method using feature extraction on pre-trained CNN networks. The results are listed in Table 3 using our custom dataset (The performance of the experiments was obtained using finetuned and pre-trained architectures for all k-folds). It is noteworthy that the CNN classifies the images and reports a confusion matrix for each severity level of disease.  Table 3 for different k-folds indicates the accuracy value for the employed pre-trained models. The most accurate ResNeXt-50 model is 97.53% for fold-5 (fold-1 and fold-2 also reach the same accuracy). It is clear that the accuracy of the model increases when we increase the value of the fold. Our unique data package achieves the highest precision of 95.98% for fold-5 and the lowest precision of 84.01% for fold-1 for AlexNet models. The same precision is achieved by fold-3, fold-4 and fold-5, for Alexnet and GoogleNet. As regards the maximum individual accuracy of the pretrained models, we have AlexNet: 87.84%, GoogleNet: 91.16%, V4: 90.31%, ResNet V2: 91.61% and ResNeXt-50: 97.53%. For all AlexNet and GoogleNet models pretrained we observe that V4, ResNet V2 and ResNeXt-50 achieve comparable accuracy after fold-3; GoogleNet after fold-1, changing after fold-5; ResNet V4 after fold-1, changing after fold-5; ResNet V2 for fold-2 and fold-3; and ResNeXt-50 for fold-1, fold-2 and fold-5.

Comparison and Discussion
The comparison of our evaluation with the recent publications is presented in Table 4. We found two recent publications and analyzed our implementation results. The authors in [15] evaluated Alexnet and ResNet architectures using a small dataset. This collection includes 4476 pictures from three clinical departments of the Sichuan Provincial People's Hospital. The authors in [11] implemented Inception V4 and ResNeXt-50 architectures with a dataset that is moderate in size. They utilized a pre-processing pipeline that converts a set of fundus images into a uniform format. They used a modified version of the Inception-V3 network. Subsequently, the performance is compared with several mainstream CNN models. The authors in [11] classified the dataset into five different classes with AlexNet and ResNet. The utilized dataset of 35,126 images exercised three types of ensembles.
In our work, we implemented several mainstream CNN architectures and evaluated five different parameters with DTL using our custom data. The authors in [24] exploited a modified version of Lenet-5 architectures that is quite old. However, in our study, we exploited recent CNNs. Regarding evaluation scores, the proposed methodology efficiently classifies images in different classes as DR, normal, mild, moderate, severe and high-risk. From our analysis, we conclude that the fine-tuning of a pre-trained CNN architecture with DTL could be employed as one of the efficient techniques in the medical field for the classification of DR images.
It is noteworthy that high-risk DR patients lie in the proliferate category. They require an immediate cure and diagnosis. In our diagnosis method, we exploit DR images of patients that reflect the posterior pole. Using a high resolution ultimately elevates the size of the dataset, which may increase the execution time for the classification. However, the use of low-resolution DR images could affect classification due to the lack of clear media. In this research work, we used high-resolution datasets from different sources as mentioned in Section 1.4. To handle this issue, we executed pre-processing of data where the DR images were resized and augmentation was performed for the picture enhancement. Last but not least, there is still a possibility of an error during classification that could be reduced, but the accuracy of the dataset reflects the correctness of the diseases. From the comparison of the state-of-the-art, we conclude that our datasets achieve better accuracy and this was the main goal of this research work.

Conclusions and Future Work
In this article we have provided a deep-transfer-learning technique, based on convolutional neural networks, for the categorization of diabetic retinopathy patients. In order to investigate the proposed deep-transfer-learning technique, five pre-trained convolutional neural network models were employed. It was observed that the fine tweaking of pre-trained models may be used effectively on a multi-class dataset. As a result, the diagnosis efficiency for diabetic retinopathy patients has been improved. Across all the five employed models, the ResNeXt-50 architecture achieved a maximum accuracy of 97.53 % for our dataset. Our high-accuracy findings can assist doctors and researchers in making clinical judgments. Our work includes a few limitations that can be addressed in future research. A more in-depth examination is needed, which requires more patient data. Future study should also focus on differentiating the accuracy of individuals with normal symptoms from those with non-proliferate symptoms. The non-proliferate symptoms may not be properly visible on retina images, or may not be visualized at all. Another probable direction is to apply the proposed method to larger datasets. It may address other medical problems such as cancer and tumors, as well being applicable in other computer vision industries such as energy, agriculture and transportation.